DeepFactors: Real-time probabilistic dense monocular SLAM
The ability to estimate rich geometry and camera motion from monocular imagery is fundamental to future interactive robotics and augmented reality applications. Different approaches have been proposed that vary in scene geometry representation (sparse landmarks, dense maps), the consistency metric used for optimising the multi-view problem, and the use of learned priors. We present a SLAM system that unifies these methods in a probabilistic framework while still maintaining real-time performance. This is achieved through the use of a learned compact depth map representation and by reformulating three different types of error (photometric, reprojection, and geometric) for use within standard factor graph software. We evaluate our system on trajectory estimation and depth reconstruction on real-world sequences, and present various examples of estimated dense geometry.
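To make the error reformulation concrete, here is a minimal sketch of optimising a compact depth code against simple residuals in a single least-squares solve. The linear basis decoder, the smoothness proxy, and all dimensions are illustrative assumptions; the actual system uses a learned network decoder and photometric, reprojection, and geometric factors inside standard factor graph software.

```python
import numpy as np
from scipy.optimize import least_squares

H, W, K = 8, 8, 4                                # tiny image and code sizes
rng = np.random.default_rng(0)
basis = rng.normal(size=(K, H, W))               # stand-in for a learned decoder
depth_obs = 2.0 + 0.1 * rng.normal(size=(H, W))  # e.g. a noisy depth estimate

def decode(code):
    # Depth as a mean plus a linear combination of basis maps: a toy
    # stand-in for the learned compact depth map representation.
    return 2.0 + np.tensordot(code, basis, axes=1)

def residuals(code):
    d = decode(code)
    r_geometric = (d - depth_obs).ravel()        # geometric-style error
    r_smooth = np.diff(d, axis=1).ravel()        # crude prior/photometric proxy
    return np.concatenate([r_geometric, 0.1 * r_smooth])

sol = least_squares(residuals, x0=np.zeros(K))   # factor-graph-style LS solve
print("optimised depth code:", sol.x)
```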
RIDI: Robust IMU Double Integration
This paper proposes a novel data-driven approach to inertial navigation, which learns to estimate trajectories of natural human motions using only the inertial measurement unit (IMU) found in every smartphone. The key observation is that human motions are repetitive and consist of a few major modes (e.g., standing, walking, or turning). Our algorithm regresses a velocity vector from the history of linear accelerations and angular velocities, then corrects the low-frequency bias in the linear accelerations, which are integrated twice to estimate positions. We have acquired training data with ground-truth motions across multiple human subjects and multiple phone placements (e.g., in a bag or a hand). Qualitative and quantitative evaluations demonstrate that our algorithm achieves results surprisingly comparable to full visual-inertial navigation. To our knowledge, this paper is the first to integrate sophisticated machine learning techniques with inertial navigation, potentially opening up a new line of research in the domain of data-driven inertial navigation. We will publicly share our code and data to facilitate further research.
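As a rough illustration of the corrected double integration described above, the sketch below regresses a velocity from a window of IMU history and uses it to remove low-frequency accelerometer bias before integrating twice. The window length, the proportional bias correction, and the `regress_velocity` callable are assumptions for illustration, not the paper's actual model.

```python
import numpy as np

def ridi_style_track(acc, gyr, dt, regress_velocity, window=200):
    """Hypothetical corrected double integration.

    acc, gyr: (N, 3) linear accelerations and angular velocities.
    regress_velocity(acc_win, gyr_win) -> (3,) velocity estimate,
    standing in for the learned regressor (trained elsewhere).
    """
    vel, pos = np.zeros(3), np.zeros(3)
    positions = []
    for i in range(len(acc)):
        win = slice(max(0, i - window), i + 1)       # IMU history window
        v_pred = regress_velocity(acc[win], gyr[win])
        # Treat the drift between integrated and regressed velocity as a
        # low-frequency accelerometer bias and subtract it (a simple
        # proportional correction, assumed for this sketch).
        bias = (vel - v_pred) / (window * dt)
        vel = vel + (acc[i] - bias) * dt             # first integration
        pos = pos + vel * dt                         # second integration
        positions.append(pos.copy())
    return np.array(positions)

# Dummy regressor that always predicts a constant walking velocity:
traj = ridi_style_track(np.zeros((500, 3)), np.zeros((500, 3)), dt=0.005,
                        regress_velocity=lambda a, g: np.array([1.0, 0.0, 0.0]))
```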
Simultaneous Optical Flow and Intensity Estimation from an Event Camera
Event cameras are bio-inspired vision sensors which mimic retinas by measuring per-pixel intensity changes rather than outputting an actual intensity image. This paradigm shift away from traditional frame cameras offers significant potential advantages, namely avoiding high data rates, dynamic range limitations, and motion blur. However, established computer vision algorithms cannot be applied directly to event data. Methods proposed so far to reconstruct images, estimate optical flow, track a camera, and reconstruct a scene come with severe restrictions on the environment or on the motion of the camera, e.g. allowing only rotation. Here, we propose, to the best of our knowledge, the first algorithm to simultaneously recover the motion field and brightness image while the camera undergoes generic motion through any scene. Our approach minimises a cost function that contains the asynchronous event data as well as spatial and temporal regularisation within a sliding window time interval. Our implementation relies on GPU optimisation and runs in near real-time. In a series of examples, we demonstrate the successful operation of our framework, including in situations where conventional cameras suffer from dynamic range limitations and motion blur.
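The following toy cost shows the general shape of such a sliding-window objective: an asynchronous event data term plus spatial and temporal regularisers. The contrast threshold, the weights, and the discretisation of events onto window frames are illustrative assumptions, not the paper's exact functional.

```python
import numpy as np

def window_cost(log_I, flow, events, C=0.2, lam_s=0.1, lam_t=0.1):
    """Toy sliding-window cost over log-intensity and a motion field.

    log_I:  (T, H, W) log-intensity frames spanning the window.
    flow:   (H, W, 2) per-pixel motion field.
    events: iterable of (t, x, y, p) with polarity p in {-1, +1}
            and t < T - 1 (events discretised onto window frames here).
    """
    # Data term: each event signals a log-intensity change of about p*C.
    data = sum((log_I[t + 1, y, x] - log_I[t, y, x] - p * C) ** 2
               for t, x, y, p in events)
    # Spatial smoothness of intensity and flow, temporal smoothness of intensity.
    spatial = sum(np.sum(np.gradient(log_I[-1], axis=a) ** 2) for a in (0, 1))
    spatial += sum(np.sum(np.gradient(flow[..., k], axis=a) ** 2)
                   for k in (0, 1) for a in (0, 1))
    temporal = np.sum(np.diff(log_I, axis=0) ** 2)
    return data + lam_s * spatial + lam_t * temporal
```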
Pairwise Decomposition of Image Sequences for Active Multi-View Recognition
A multi-view image sequence provides a much richer capacity for object recognition than a single image. However, most existing solutions to multi-view recognition adopt hand-crafted, model-based geometric methods, which do not readily embrace recent trends in deep learning. We propose to bring Convolutional Neural Networks to generic multi-view recognition by decomposing an image sequence into a set of image pairs, classifying each pair independently, and then learning an object classifier by weighting the contribution of each pair. This allows for recognition over arbitrary camera trajectories, without requiring explicit training over the potentially infinite number of camera paths and lengths. Building these pairwise relationships then naturally extends to the next-best-view problem in an active recognition framework. To achieve this, we train a second Convolutional Neural Network to map directly from an observed image to the next viewpoint. Finally, we incorporate this into a trajectory optimisation task, whereby the best recognition confidence is sought for a given trajectory length. We present state-of-the-art results in both guided and unguided multi-view recognition on the ModelNet dataset, and show how our method can be used with depth images, greyscale images, or both.
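A minimal sketch of the pairwise decomposition is below: every image pair in the sequence is classified independently and the per-pair predictions are fused with learned weights. The `pair_classifier` and `pair_weight` callables stand in for the trained CNNs; the weighted-average fusion rule is assumed for illustration.

```python
import numpy as np
from itertools import combinations

def classify_sequence(images, pair_classifier, pair_weight):
    """Fuse per-pair class predictions over an arbitrary image sequence.

    pair_classifier(img_a, img_b) -> (num_classes,) probability vector
    pair_weight(img_a, img_b)     -> scalar confidence for that pair
    Both stand in for learned CNNs (assumed trained elsewhere).
    """
    probs, weights = [], []
    for a, b in combinations(images, 2):     # decompose into image pairs
        probs.append(pair_classifier(a, b))
        weights.append(pair_weight(a, b))
    probs, weights = np.asarray(probs), np.asarray(weights)[:, None]
    fused = (weights * probs).sum(axis=0) / weights.sum()
    return int(np.argmax(fused)), fused      # label and fused distribution
```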
Learning to complete object shapes for object-level mapping in dynamic scenes
In this paper, we propose a novel object-level mapping system that can simultaneously segment, track, and reconstruct objects in dynamic scenes. It can further predict and complete their full geometries by conditioning on reconstructions from depth inputs and a category-level shape prior, with the aim that completed object geometry leads to better object reconstruction and tracking accuracy. For each incoming RGB-D frame, we perform instance segmentation to detect objects and build data associations between the detections and the existing object maps. A new object map is created for each unmatched detection. For each matched object, we jointly optimise its pose and latent geometry representation using geometric and differentiable rendering residuals towards its shape prior and completed geometry. Our approach shows better tracking and reconstruction performance than methods using traditional volumetric mapping or learned shape priors. We evaluate its effectiveness quantitatively and qualitatively on both synthetic and real-world sequences.
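One concrete step in the pipeline above is the per-frame data association between detections and existing object maps. The greedy IoU matching below is an assumed rule for illustration; the abstract does not specify the exact criterion.

```python
def associate_detections(detections, object_maps, iou, threshold=0.3):
    """Greedily match detected instance masks to existing object maps.

    detections:  instance masks from the current RGB-D frame.
    object_maps: objects able to render a predicted mask into this view
                 (exposed here as `om.predicted_mask`, a hypothetical field).
    iou(a, b):   mask-overlap score in [0, 1].
    """
    matches, unmatched, used = [], [], set()
    for det in detections:
        scores = [(iou(det, om.predicted_mask), j)
                  for j, om in enumerate(object_maps) if j not in used]
        score, j = max(scores, default=(0.0, None))
        if j is not None and score >= threshold:
            matches.append((det, object_maps[j]))
            used.add(j)
        else:
            unmatched.append(det)            # each spawns a new object map
    return matches, unmatched
```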
Marker based Thermal-Inertial Localization for Aerial Robots in Obscurant Filled Environments
For robotic inspection tasks in known environments, fiducial markers provide a reliable and low-cost solution for robot localization. However, detection of such markers relies on the quality of RGB camera data, which degrades significantly in the presence of visual obscurants such as fog and smoke. The ability to navigate known environments in the presence of obscurants can be critical for inspection tasks, especially in the aftermath of a disaster. Addressing such a scenario, this work proposes a method for the design of fiducial markers to be used with thermal cameras for the pose estimation of aerial robots. Our low-cost markers are designed to work in the long-wave infrared spectrum, which is not affected by the presence of obscurants, and can be affixed to any object that has a measurable temperature difference with respect to its surroundings. Furthermore, the estimated pose from the fiducial markers is fused with inertial measurements in an extended Kalman filter to remove high-frequency noise and error present in the fiducial pose estimates. The proposed markers and the pose estimation method are experimentally evaluated in an obscurant-filled environment using an aerial robot carrying a thermal camera.
Comment: 10 pages, 5 figures, Published in International Symposium on Visual Computing 201
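The fusion step lends itself to a compact sketch: a single-axis Kalman filter predicted with IMU acceleration and corrected by marker position fixes. The one-dimensional state and the noise values are simplifying assumptions; the actual filter estimates a full 6-DoF pose.

```python
import numpy as np

def ekf_step(x, P, accel, z_pos, dt, q=1e-3, r=1e-2):
    """One predict/update cycle of a toy single-axis filter.

    x = [position, velocity]; accel is the IMU input, z_pos the
    position observed from the fiducial marker. q, r are assumed
    process/measurement noise levels for this sketch.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([0.5 * dt**2, dt])
    x = F @ x + B * accel                     # IMU-driven prediction
    P = F @ P @ F.T + q * np.eye(2)
    H = np.array([[1.0, 0.0]])                # marker observes position only
    y = z_pos - H @ x                         # innovation
    S = H @ P @ H.T + r                       # innovation covariance (1x1)
    K = P @ H.T / S                           # Kalman gain
    x = x + (K * y).ravel()                   # corrected state
    P = (np.eye(2) - K @ H) @ P
    return x, P
```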
MonoSLAM: Real-time single camera SLAM
Published version
Bundle adjustment on a graph processor
Graph processors such as Graphcore's Intelligence Processing Unit (IPU) are part of the major new wave of novel computer architecture for AI, and have a general design with massively parallel computation, distributed on-chip memory and very high inter-core communication bandwidth which allows breakthrough performance for message passing algorithms on arbitrary graphs. We show for the first time that the classical computer vision problem of bundle adjustment (BA) can be solved extremely fast on a graph processor using Gaussian Belief Propagation. Our simple but fully parallel implementation uses the 1216 cores on a single IPU chip to, for instance, solve a real BA problem with 125 keyframes and 1919 points in under 40ms, compared to 1450ms for the Ceres CPU library. Further code optimisation will surely increase this difference on static problems, but we argue that the real promise of graph processing is for flexible in-place optimisation of general, dynamically changing factor graphs representing Spatial AI problems. We give indications of this with experiments showing the ability of GBP to efficiently solve incremental SLAM problems, and deal with robust cost functions and different types of factors.
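To illustrate the mechanism (not the BA problem itself), here is Gaussian Belief Propagation on a toy one-dimensional chain, with all quantities kept in information form. The chain structure, precisions, and synchronous schedule are assumptions for the sketch; real bundle adjustment uses 6-DoF pose and landmark variables with reprojection factors.

```python
import numpy as np

# GBP on a chain x0 - x1 - x2: unary measurement factors anchor each
# variable; pairwise factors x_i ~ x_j couple neighbours. Messages and
# beliefs are stored in information form (eta, Lambda).
z = np.array([0.0, 1.0, 2.0])                 # unary measurements
l_meas, l_pair = 1.0, 2.0                     # factor precisions
edges = [(0, 1), (1, 2)]
m_eta = {(i, j): 0.0 for a, b in edges for (i, j) in ((a, b), (b, a))}
m_lam = {k: 0.0 for k in m_eta}

def beliefs():
    b_eta = l_meas * z + np.array([sum(e for (k, i), e in m_eta.items() if i == v)
                                   for v in range(3)])
    b_lam = l_meas + np.array([sum(l for (k, i), l in m_lam.items() if i == v)
                               for v in range(3)])
    return b_eta, b_lam

for _ in range(30):                           # synchronous message passing
    b_eta, b_lam = beliefs()
    for a, b in edges:
        for i, j in ((a, b), (b, a)):
            ce = b_eta[i] - m_eta[(j, i)]     # cavity: drop the return message
            cl = b_lam[i] - m_lam[(j, i)]
            m_lam[(i, j)] = l_pair - l_pair**2 / (cl + l_pair)
            m_eta[(i, j)] = l_pair * ce / (cl + l_pair)

b_eta, b_lam = beliefs()
print("posterior means:", b_eta / b_lam)      # MAP estimate of the chain
```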
Monocular, Real-Time Surface Reconstruction using Dynamic Level of Detail
We present a scalable, real-time capable method for robust surface reconstruction that explicitly handles multiple scales. As a monocular camera browses a scene, our algorithm processes images as they arrive and incrementally builds a detailed surface model. While most existing reconstruction approaches rely on volumetric or point-cloud representations of the environment, we perform depth-map and colour fusion directly into a multi-resolution triangular mesh that can be adaptively tessellated using the concept of Dynamic Level of Detail. Our method relies on least-squares optimisation, which enables a probabilistically sound and principled formulation of the fusion algorithm. We demonstrate that our method is capable of obtaining high-quality close-up reconstructions, as well as capturing overall scene geometry, while being memory- and computationally efficient.
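At its core, probabilistic least-squares fusion of a new depth sample into a surface estimate reduces to combining Gaussian estimates by precision-weighted averaging. The per-vertex scalar form below is a simplifying assumption; the paper's formulation operates on a multi-resolution mesh and fuses colour as well.

```python
def fuse_vertex_depth(depth, prec, obs_depth, obs_prec):
    """Fuse a new Gaussian depth observation into a mesh vertex estimate.

    Combining two Gaussians in a least-squares sense gives the
    precision-weighted mean; precisions (inverse variances) add.
    """
    new_prec = prec + obs_prec
    new_depth = (prec * depth + obs_prec * obs_depth) / new_prec
    return new_depth, new_prec

# Example: vertex at 2.0 m (precision 4.0) fused with an observation
# at 2.1 m (precision 1.0):
print(fuse_vertex_depth(2.0, 4.0, 2.1, 1.0))   # -> (2.02, 5.0)
```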